init commit optimistic relay #1
base: main
Conversation
Agreed, Option 2 is best :) A nice-to-have goal is for the optimistic relay code to be upstreamed to https://github.com/flashbots/mev-boost-relay, and for that we should avoid redefining things like `HighPrio`.
Great work! I also agree on option 2. One question about point 2 of the implementation: I'm assuming the …

I'm also wondering if there's a possible edge case where there are optimistic blocks buffered in memory when the service is shut down (which can happen at any time), and we end up never validating them. The worst-case block validation time can reach up to 6s. The 95th percentile is much lower, but the distribution has a very long right tail, so it does happen. Just thinking out loud. Excited to see this idea come to life :)
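One way to address the shutdown edge case raised above: stop accepting new submissions, then drain and validate whatever is still buffered before the process exits. A minimal sketch, with assumed names (the PR's actual channel and simulation APIs may differ):

```go
package relay

import "log"

// block and simulateBlock are stand-ins for the PR's real submission type
// and its call into the prio-load-balancer; the names are assumptions.
type block struct{ builderPubkey string }

func simulateBlock(b block) error { return nil }

// drainOptimisticBlocks closes the channel once all senders have stopped,
// then validates everything still buffered, so no optimistic block is
// left unchecked at shutdown.
func drainOptimisticBlocks(blockC chan block) {
	close(blockC)
	for b := range blockC {
		if err := simulateBlock(b); err != nil {
			// record the demotion/refund for this builder before exiting
			log.Printf("builder %s submitted an invalid block: %v", b.builderPubkey, err)
		}
	}
}
```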
Ok, I added more to this implementation. I break down the changes into 3 main sequences:

a) Block Submission Flow
b) Optimistic Block Processing
c) Refund Flow

I will describe each of these in more detail and the changes required for them.
Notice that for builders in `Optimistic` status, we skip the block simulation and push the block onto the buffered channel to be validated asynchronously.
This goroutine handles the simulation of all the blocks that we optimistically skipped in the block submission flow.
This flow represents the process of checking whether our optimistic block processing ever results in a validator submitting an invalid block. Since block builders will post collateral, that collateral will be used to reimburse the validator. Since refunds should be a relatively rare event, we plan on handling them manually when they occur.
Based on the advice of @JustinDrake, I added a check for the amount of collateral from the builder, which is set in the database and redis. If their collateral is less than the value of a block they propose, then we fall back to the high-prio queue for that block. I chose to make this state change non-permanent, because they might have just found one really profitable block that turns out to be valid, and we shouldn't penalize them just because their collateral can't cover the block value. I am going to create a state machine diagram outlining the status and collateral fields of the builder in the redis cache and the DB. One open question is how we get the redis cache updated with the collateral values we set in the DB. Still need to iron this out a bit.
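For concreteness, a minimal sketch of the collateral gate described above, with assumed names and a `*big.Int` representation for collateral and block value:

```go
package relay

import "math/big"

// shouldProcessOptimistically sketches the collateral check: only skip
// simulation when the builder is Optimistic AND its posted collateral
// covers the value of the submitted block. Names are assumptions.
func shouldProcessOptimistically(status string, collateral, blockValue *big.Int) bool {
	if status != "Optimistic" {
		return false
	}
	// Fall back to the high-prio simulation queue for blocks worth more
	// than the builder's collateral (without permanently demoting them).
	return collateral.Cmp(blockValue) >= 0
}
```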
Also, an additional note about the DB. The migration code is in place so that people who are already running the relay can pull upstream changes while keeping their DB in place, with only the fields changing. I modified the 001-initdb file directly, which makes these changes incompatible with relays that are already running. This might be something we want to refactor to make backwards compatible, but the migration logic would be non-trivial (e.g., we would need to update the builder status based on the values of the existing rows rather than just dropping the column and adding a new one).
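To illustrate what the backwards-compatible alternative could look like, here is a hypothetical follow-up migration instead of an edit to 001-initdb. The table and column names below are assumptions, not the relay's actual schema:

```go
package migrations

// Migration002BuilderCollateral is illustrative only: it upgrades an
// existing deployment in place rather than requiring a fresh initdb.
const Migration002BuilderCollateral = `
ALTER TABLE blockbuilder ADD COLUMN status text NOT NULL DEFAULT 'LowPrio';
ALTER TABLE blockbuilder ADD COLUMN collateral numeric NOT NULL DEFAULT 0;

-- Derive the new status from the existing per-row flags rather than
-- dropping and recreating columns, so running relays keep their data.
UPDATE blockbuilder SET status = 'Blacklisted' WHERE is_blacklisted;
UPDATE blockbuilder SET status = 'HighPrio' WHERE is_high_prio AND NOT is_blacklisted;
`
```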
```diff
@@ -286,6 +298,9 @@ func (api *RelayAPI) StartServer() (err error) {
 	if api.opts.BlockBuilderAPI {
 		// Get current proposer duties blocking before starting, to have them ready
 		api.updateProposerDuties(bestSyncStatus.HeadSlot)
+
+		// TODO(mikeneuder): consider if we should use >1 optimisticBlockProcessors.
+		go api.startOptimisticBlockProcessor()
```
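For readers of the diff, a rough sketch of what a processor like this could look like. All type and helper names below are stand-ins, not the PR's actual code:

```go
package relay

// Stand-ins for the PR's real types; names are assumptions.
type blockRequest struct{ builderPubkey string }

func simulateBlock(r blockRequest) error        { return nil } // wraps the prio-load-balancer call
func demoteBuilder(pubkey string, simErr error) {}             // e.g. set builder status to LowPrio

// startOptimisticBlockProcessor drains the buffered channel one block at
// a time, waiting for each simulation result before reading the next
// block, and demoting the builder if its block turns out to be invalid.
func startOptimisticBlockProcessor(blockC <-chan blockRequest) {
	for r := range blockC {
		if err := simulateBlock(r); err != nil {
			demoteBuilder(r.builderPubkey, err)
		}
	}
}
```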
`optimisticBlockProcessors` are extremely lightweight, right? Why have more than 1?
i guess it depends how many optimistic blocks we get. `startOptimisticBlockProcessor` waits for the response of the block simulation from the prio-load-balancer, so if we only have one goroutine running this we are processing all the optimistic blocks serially. i thought it might be desirable to have a few goroutines sending the simulation requests because we have multiple validation nodes running. thoughts?
> it depends how many optimistic blocks we get

We should expect to get hammered with optimistic blocks, partly because all the builders will want to connect to the ultra sound relay, but also partly because we can relax QoS limits (like max 2 blocks/sec).

> if we only have one goroutine running this we are processing all the optimistic blocks serially

My naive understanding was that `startOptimisticBlockProcessor` calls `simulateBlock`, which calls `api.blockSimRateLimiter.send`, which itself uses parallelism.
that is the call path where we call `api.BlockSimRateLimiter.send`. that function can handle a bunch of concurrent blocks, but it still waits on the response from the simulation for the single block that we sent. so if we only have a single `go startOptimisticBlockProcessor()`, I think we are only processing one block from that channel at a time. e.g.,

- we receive a block from the channel
- we call `api.BlockSimRateLimiter.send`
- we wait until send returns to get the simulation error

whereas we could/should have multiple goroutines listening on the channel, and concurrent calls to `api.BlockSimRateLimiter.send`. maybe i am missing something though.
> concurrent calls to `api.BlockSimRateLimiter.send`

Ok, agreed we should have concurrent calls to `api.BlockSimRateLimiter.send` :) I don't have a particularly informed opinion as to the most natural way to do it here.
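For what it's worth, one natural pattern in Go is a small worker pool: N goroutines all reading from the same channel, so there are N concurrent simulations in flight while each worker still blocks on its own response. A minimal sketch, reusing the stand-in names from the sketch above:

```go
package relay

import "sync"

// startOptimisticBlockProcessors starts n workers that all drain the same
// channel, giving n concurrent calls into the simulator.
func startOptimisticBlockProcessors(blockC <-chan blockRequest, n int) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range blockC {
				if err := simulateBlock(r); err != nil {
					demoteBuilder(r.builderPubkey, err)
				}
			}
		}()
	}
	return &wg // callers can Wait() after closing blockC on shutdown
}
```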
OK, another rather large structural change in: 3a37075
there are a few potential paths we could take to set up an optimistic relay. below are two that i considered:
Option 1. Leave the builder status as is (`HighPrio`, `LowPrio`, `Blacklisted`). Modify the relay logic to only run block simulation for `LowPrio` builders, thus redefining `HighPrio` as optimistic. Thus when we call the `prio-load-balancer` (https://github.com/flashbots/prio-load-balancer) we only ever use the `LowPrio` queue.

Pros

Cons

- no distinction between `LowPrio` and `HighPrio` builders who are in the pessimistic mode.

Option 2. Add a new builder status called `Optimistic`. Thus we effectively have 4 builder statuses: `Optimistic > HighPrio > LowPrio > Blacklisted`.

Pros

- we avoid overloading `HighPrio`, which has a different semantic meaning in https://github.com/flashbots/prio-load-balancer, with `Optimistic`, which is a new concept we introduce.

Cons

Based on the above, I propose Option 2 for its flexibility.
initial plan for changes required to implement Option 2:

1. Add a new builder status called `Optimistic`.
2. Modify the `handleSubmitNewBlock` Builder API handler to skip the block simulation for blocks from builders who are in `Optimistic` state. These blocks will be pushed into an `OptimisticBlock` buffered channel where their validity will be checked asynchronously. If any of them turn out to be invalid, we will demote that block builder to `LowPrio` status (or could even blacklist them if we want to be conservative). (a sketch of this handler change is included at the end of this description)
3. Modify the `handleGetHeader` Proposer API handler to check for the best bid block (could do this in-band or through a similar buffered channel mechanism as we do for all incoming blocks). If it turns out that the winning bid was an invalid block for one of our proposers (and they ended up publishing it), we need to reimburse them using the collateral posted by the builder of the invalid block (still need to flesh this out a bit).

open questions:
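To make step 2 of the plan concrete, a minimal sketch of the submit-handler branch. All names here are hypothetical stand-ins, not the relay's real API:

```go
package relay

// submission stands in for the real block-submission payload.
type submission struct{ builderPubkey, status string }

// simulate stands in for the synchronous block simulation call.
func simulate(s submission) error { return nil }

// handleSubmission: Optimistic builders skip the synchronous simulation
// and are enqueued for async validation; everyone else keeps the
// pessimistic simulate-before-accept path.
func handleSubmission(s submission, optimisticC chan<- submission) error {
	if s.status == "Optimistic" {
		select {
		case optimisticC <- s:
			return nil // validity is checked later by the block processors
		default:
			// Buffer full: simulate synchronously rather than drop the bid.
			return simulate(s)
		}
	}
	return simulate(s)
}
```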